feat(echo): per-role echo weights + user-supplied token filter #2782
Merged
Echo's selection surface generalizes from the observations="tool"|"all" binary to a role table: each env-provided message role (system / user / assistant / tool) trains at its own alpha, selected via the renderer's per-token attribution. An optional filter hook (import_path + kwargs, matching the custom advantage/loss precedent) narrows the selection per rollout with one keep-mask per trajectory step.

- completion_obs_mask (bool) -> completion_obs_weights (float): the per-token weight carries its role's alpha, so stamping folds it into ce_weights directly and stamp_loss_routing drops the scalar observation_weight parameter. Orchestrator-internal as before.
- The echo preset is unchanged in meaning: tool-response bodies at 0.1. Setting any role replaces the whole table.
- Echo now always requires the renderer (role selection needs attribution); the blanket "all" mode is gone, so assemble the roles you want instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Stacked on #2746. Generalizes ECHO's token selection, porting the selection vocabulary from @snimu's #2677 onto the component weight streams.
What changes
Per-role echo weights. The `observations = "tool" | "all"` binary becomes a role table: each env-provided message role trains at its own α, selected via the renderer's per-token attribution (`message_indices` / `message_roles` / `is_content`).

The `echo` preset is unchanged in meaning: tool-response bodies at `alpha = 0.1`. Presets stay atomic; setting any role replaces the whole table.

User-supplied token filter. An optional hook narrows the role selection per rollout, e.g. dropping warning lines from tool output, or tokens the sampler found unlikely. Config matches the custom advantage/loss precedent (`import_path` + `kwargs`); the signature is the one from #2677.

The callable sees the raw rollout (message text, sampling logprobs), so content filters and sampling-probability filters need no further framework surface. Shape violations fail loudly. The filter can only narrow the selection, never widen it.
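The selection rules above (per-role α, content-only tokens, a filter that can narrow but never widen) can be sketched in a few lines of Python. This is an illustration only, not the PR's implementation: `select_weights`, its parameters, and the flattened per-token `token_roles` list are hypothetical names, and the real attribution (`message_indices` / `message_roles` / `is_content`) and the filter signature from #2677 may be shaped differently.

```python
from typing import Optional, Sequence


def select_weights(
    token_roles: Sequence[Optional[str]],
    is_content: Sequence[bool],
    role_alphas: dict,
    keep_mask: Optional[Sequence[bool]] = None,
) -> list:
    """Return one float weight per token (hypothetical helper).

    A token gets its role's alpha only if it is message content and its
    role appears in the table; everything else gets 0.0.
    """
    n = len(token_roles)
    if len(is_content) != n:
        raise ValueError("is_content must have one entry per token")
    # Shape violations fail loudly rather than being silently truncated.
    if keep_mask is not None and len(keep_mask) != n:
        raise ValueError("keep_mask must have one entry per token")

    weights = []
    for i in range(n):
        alpha = role_alphas.get(token_roles[i], 0.0) if is_content[i] else 0.0
        # The filter can only zero out a selected token; keep=True on an
        # unselected token leaves its weight at 0.0 (narrow, never widen).
        if keep_mask is not None and not keep_mask[i]:
            alpha = 0.0
        weights.append(alpha)
    return weights


# Assumed example data: one assistant token, a tool-message header token
# (not content), two tool-body tokens, and one unattributed token.
roles = ["assistant", "tool", "tool", "tool", None]
content = [True, False, True, True, False]
preset = {"tool": 0.1}  # mirrors the echo preset: tool bodies at 0.1

unfiltered = select_weights(roles, content, preset)
filtered = select_weights(roles, content, preset, [True, True, False, True, True])
```

Here `unfiltered` weights only the two tool-body tokens, and `filtered` drops the first of them while leaving every unselected token at zero.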
How it works
`completion_obs_mask` (bool, orchestrator-internal) becomes `completion_obs_weights` (float): each selected token carries its role's α from `interleave_rollout`, and `stamp_loss_routing` folds the weights into the `ce_weights` stream directly; the scalar `observation_weight` parameter is gone. Trainer untouched: per-component global normalization already keeps echo tokens out of the rl denominator, and a zero weight excludes a token from the ce component's numerator and denominator (true masking, not dilution).

Breaking (pre-merge, no compat shims)
- `advantage.observation_weight` / `advantage.observations` are replaced by `advantage.roles.<role>.alpha`.
- Echo now always requires `orchestrator.renderer` (role selection needs attribution). The blanket `"all"` mode is gone; assemble the roles you want instead.

Not in this PR
- `tool_names` filtering (restrict echo to specific tool functions): deferred; non-breaking optional field on the tool role later, `message_tool_names` already rides in the attribution.

🤖 Generated with Claude Code
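The "true masking, not dilution" point under How it works can be checked with a small weighted mean. This is a sketch under an assumption: the denominator here is the sum of the weights, which may not match the trainer's actual per-component normalization, and `weighted_ce_mean` is a hypothetical name rather than a function from this PR.

```python
def weighted_ce_mean(token_losses, ce_weights):
    """Weighted mean of per-token losses (hypothetical helper).

    Assumption: normalize by the sum of weights, so a zero-weight token
    contributes to neither the numerator nor the denominator.
    """
    if len(token_losses) != len(ce_weights):
        raise ValueError("token_losses and ce_weights must align")
    numerator = sum(w * l for w, l in zip(ce_weights, token_losses))
    denominator = sum(ce_weights)
    return numerator / denominator if denominator > 0 else 0.0


def diluted_mean(token_losses, ce_weights):
    """Contrast case: dividing by the token count lets zeros dilute the mean."""
    numerator = sum(w * l for w, l in zip(ce_weights, token_losses))
    return numerator / len(token_losses)


# The middle token is masked with weight 0.0.
with_masked = weighted_ce_mean([2.0, 4.0, 6.0], [0.1, 0.0, 0.1])
without_masked = weighted_ce_mean([2.0, 6.0], [0.1, 0.1])

diluted_with = diluted_mean([2.0, 4.0, 6.0], [0.1, 0.0, 0.1])
diluted_without = diluted_mean([2.0, 6.0], [0.1, 0.1])
```

Under the sum-of-weights assumption, `with_masked` equals `without_masked`, so the masked token has no effect at all. The count-normalized variant shrinks when the zero-weight token is present, which is the dilution the PR says it avoids.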
Note
Medium Risk
Breaking ECHO config and loss-routing wire fields change which env tokens get CE supervision; misconfigured roles or filters could silently drop observation training, though validation enforces renderer and filter shapes.
Overview
ECHO observation supervision is generalized from a single `observation_weight` + `observations = "tool" | "all"` switch to a per-message-role table (`roles.{system,user,assistant,tool}.alpha`) and an optional user filter (`import_path` + `kwargs`).

The `echo` preset still means tool-response bodies at `α = 0.1`; defining any role replaces the entire role table. Orchestrator-internal tagging moves from boolean `completion_obs_mask` to float `completion_obs_weights` (per-token α), folded into `ce_weights` in `stamp_loss_routing` without a global observation scalar. `interleave_rollout` selects tokens via renderer attribution and can narrow them with validated per-step keep-masks. Echo always requires `orchestrator.renderer` (no `"all"` escape hatch). Docs, debug `echo.toml`, and config skill text are updated for the breaking config shape.

Reviewed by Cursor Bugbot for commit 509d6a3. Bugbot is set up for automated code reviews on this repo.